PLI-VINS: Visual-Inertial SLAM Based on Point-Line Feature Fusion in Indoor Environment

In indoor low-texture environments, point feature-based visual SLAM systems suffer from poor robustness and low trajectory accuracy. We therefore propose a visual-inertial SLAM algorithm based on point-line feature fusion. First, to improve the quality of the extracted line segments, a line segment extraction algorithm with an adaptive threshold is proposed. By constructing the adjacent matrix of each line segment and judging the direction of the line segments, the algorithm decides whether to merge or eliminate other line segments. In addition, line feature matching with geometric constraints is used to improve the efficiency of line feature processing. Compared with the traditional algorithm, the processing efficiency of the proposed method is greatly improved. Then, point, line, and inertial data are fused in a sliding window to achieve high-accuracy pose estimation. Finally, experiments on the EuRoC dataset show that the proposed PLI-VINS performs better than traditional visual-inertial SLAM systems that use point features or point-line features.


Introduction
In recent years, simultaneous localization and mapping (SLAM) has become a research hotspot in the field of mobile robots and is considered a core component of autonomous navigation. SLAM comprises two main tasks, namely localizing the robot and mapping an unknown environment: the pose of the robot is obtained by detecting surrounding features with its sensors during motion, and the map of the environment is then constructed from the robot's point of view [1,2].
Visual SLAM methods can be divided into direct methods and feature-based methods according to how camera motion is estimated from the images. Direct methods, such as DTAM [3], LSD-SLAM [4] and DSO [5], estimate camera motion from the pixel brightness of the image and optimize the solution by minimizing the photometric error. However, they rely on the brightness constancy assumption, that is, the gray level of the same spatial point is assumed to be fixed across successive image frames. Point feature methods mainly use SIFT [6], ORB [7] or SURF [8] to extract and match point features. Based on the feature matches, incremental bundle adjustment minimizes the reprojection error to estimate the camera pose, as in PTAM [9] and ORB-SLAM2 [10].
However, the shortcomings of pure visual SLAM are also obvious. It is sensitive to fast motion, aggressive rotation and overexposure. The inertial measurement unit (IMU) samples at a higher frequency than the camera and can therefore capture motion more accurately, but the IMU also drifts. Combining the two effectively compensates for the visual degradation of the camera and corrects the drift of the IMU, providing better data. To this end, researchers have combined vision and IMU measurements and proposed a number of tightly coupled visual-inertial SLAM methods that jointly optimize

•
In order to effectively obtain the structural information of indoor environments and process the environment with repeated texture, an adaptive threshold line segment extraction algorithm is proposed on the premise of point-line feature fusion, which is used to process various redundant line segments in indoor environments to improve the quality of line features.

•
The point feature, line feature and IMU information are effectively fused in an optimization-based sliding window to achieve high precision pose estimation. Experiments on EuRoC datasets [22] show that the algorithm presented in this paper has better performance than optical flow-based VINS-Mono and point-line based PL-VINS.
In the remainder of this article, the architecture of the proposed approach is described in Section 2. Sections 3 and 4 describe in detail the work of the line segment extraction algorithm proposed in this paper in indoor environments and the effective utilization of point and line features and IMU in sliding windows. Section 5 describes the experimental setup and the experimental results under a common data set. Finally, Section 6 provides concluding observations and describes future work.

System Overview
The method proposed in this paper builds on the VINS-Mono system. The system block diagram is shown in Figure 1; it is divided into measurement preprocessing, local sliding window optimization and loop closure detection. The initialization process adopts the same strategy as VINS-Mono. Following a loosely coupled strategy for visual and IMU information, the poses of all frames in the sliding window and the inverse depths of the 3D points are first estimated by pure visual structure from motion (SFM) and then aligned with the result of IMU pre-integration to solve for the initialization parameters.
For point feature extraction and tracking, Shi-Tomasi [23] feature points are first extracted from the input images and then matched and tracked with the optical flow method. For line features, the proposed adaptive threshold line segment extraction algorithm is used to extract line segments, LBD [24] descriptors are computed, and the KNN [25] algorithm is used to match them based on descriptor distance and line segment angle. This process is described in detail in Section 3 of this article.
After system initialization, the point-line feature results are sent to the sliding window optimization module, which also pre-integrates the IMU data. The nonlinear estimator based on the sliding window model constructs a joint optimization function from the point-line constraints, IMU constraints and loop closure constraints, and solves for the position, velocity, rotation and bias of all frames in the sliding window. The details are given in Sections 3 and 4 of this paper.
The loop detection part follows the strategy of VINS-Mono. First, whether to insert a key frame is decided according to the parallax between two frames. If a key frame is inserted, loop detection is performed with the DBoW [26] bag-of-words model and BRIEF [27] descriptors. If a loop is detected, relocalization keeps the current sliding window aligned with the past pose map: all poses of the loop closure frames are treated as constants, and all IMU measurements, local visual measurements and the corresponding features retrieved from the loop closure are used to optimize the sliding window, which reduces both the accumulated error and the computational load of the system. Because visual-inertial information makes the roll and pitch angles observable, accumulated drift occurs in only four degrees of freedom (DOF): the three-axis position and the heading angle. Global trajectory consistency can therefore be ensured by adding key frames to the pose graph and optimizing only these 4 DOF.

Point Line Feature Processing
For point features in indoor environments, this paper uses the Shi-Tomasi algorithm to detect corner points and the KLT optical flow algorithm [28] to track and match them; RANSAC-based epipolar geometric constraints [29] are then used to separate inliers from outliers and eliminate the outliers. For line features in indoor scenes, an adaptive threshold line segment extraction algorithm is proposed. LBD and KNN are then used to describe and match the line features, and line feature outliers are identified by checking the Hamming distance and the angle of the matched line segments. Figure 2 compares the traditional LSD and KLT optical flow with the proposed algorithm in the factory scene of the EuRoC datasets.

Adaptive Threshold Line Segment Extraction Algorithm
When the traditional LSD algorithm is used in structured scenes, it tends to produce many short, crossed and overlapping line segments. As shown in Figure 3b,c, these line segments easily cause matching difficulties, which reduces the speed and accuracy of camera pose estimation. We propose an adaptive threshold line segment extraction algorithm that merges and removes such line segments to further reduce redundant matches and mismatches of line features, thus improving the robustness and accuracy of the proposed algorithm.
First, length screening is applied to the set $\{l_1, l_2, \cdots, l_N\}$ of all line segments extracted by the traditional LSD algorithm: any short line segment whose length $len_{l_i}$ is less than the length threshold $len_{min}$ is eliminated. Length screening removes the short line segments that strongly affect pose estimation. The length threshold $len_{min}$ is determined by $N$, $W_k$ and $H_k$, where $N$ is the number of line features extracted from the image of frame $k$, $W_k$ and $H_k$ are the width and height of the current frame $k$, and $\lceil \cdot \rceil$ denotes rounding up.
For the three common line segment configurations shown in Figure 4, this paper constructs the external matrix of each line segment $l_i$ that survives length screening and determines whether the head, tail or midpoint of any adjacent line segment lies inside the external matrix region. The line segments that meet this condition are added to the same set $\{l_i, l_{i_1}, l_{i_2}, \cdots, l_{i_n}\}$. In the case of Figure 4b, no endpoint is located in the external matrix, so the segment is eliminated from consideration. Since each line segment in the set is defined by a known starting point and ending point, its main direction, that is, the angle of the corresponding vector in the image coordinate system, can be calculated. As shown in Figure 4a,c, the main directions of line segment $l_i$ and the other line segments $l_{i_n}$ in the set are calculated and their average is taken as the angle threshold $ang_{min}$; line segments whose angle with $l_i$ is greater than $ang_{min}$ are then eliminated. Finally, the head, tail and midpoint of every remaining line segment are collected, and a line segment is fitted to this point set by the least squares method.
Compared with the single, empirically set thresholds used in [20,30-32], the threshold in this paper depends on the number of extracted line segments, the image size and the scene, so it adapts more effectively to changes between different indoor scenes.
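The screening and merging steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the length threshold is simply passed in as a parameter `len_min`, the candidate set is assumed to have been collected already, the angle threshold is read as the mean angular difference to the reference segment, and the final fit uses orthogonal least squares via SVD as one reasonable choice. All function names are hypothetical.

```python
import numpy as np

def segment_length(seg):
    """Euclidean length of a segment given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = seg
    return float(np.hypot(x2 - x1, y2 - y1))

def main_direction(seg):
    """Undirected orientation of a segment in [0, pi)."""
    x1, y1, x2, y2 = seg
    return float(np.arctan2(y2 - y1, x2 - x1) % np.pi)

def angle_between(a, b):
    """Smallest angle between two undirected orientations."""
    d = abs(a - b) % np.pi
    return min(d, np.pi - d)

def length_screen(segments, len_min):
    """Drop segments shorter than the (externally supplied) length threshold."""
    return [s for s in segments if segment_length(s) >= len_min]

def angle_screen(ref, candidates):
    """Keep candidates whose angle to `ref` does not exceed the mean angle.

    Assumption: the threshold is the average angular difference to `ref`.
    """
    if not candidates:
        return []
    ref_dir = main_direction(ref)
    diffs = [angle_between(ref_dir, main_direction(c)) for c in candidates]
    ang_min = float(np.mean(diffs))
    return [c for c, d in zip(candidates, diffs) if d <= ang_min]

def merge_segments(segments):
    """Fit one segment to the heads, tails and midpoints of `segments`."""
    pts = []
    for x1, y1, x2, y2 in segments:
        pts += [(x1, y1), (x2, y2), ((x1 + x2) / 2.0, (y1 + y2) / 2.0)]
    pts = np.asarray(pts, dtype=float)
    centroid = pts.mean(axis=0)
    # First right singular vector = direction of largest spread.
    _, _, vt = np.linalg.svd(pts - centroid)
    direction = vt[0]
    t = (pts - centroid) @ direction
    p_start = centroid + t.min() * direction
    p_end = centroid + t.max() * direction
    return (*p_start, *p_end)
```

For example, two overlapping horizontal segments `(0, 0, 10, 0)` and `(9, 0, 20, 0)` are merged into a single segment spanning x = 0 to x = 20; which end is reported first depends on the sign of the fitted direction.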
LBD descriptors are extracted from the filtered line segments for subsequent feature matching. The KNN algorithm is then used for line segment matching. A match is considered successful if both the matching distance and the angle are less than their thresholds.
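A brute-force version of this matching rule might look like the sketch below. It assumes that descriptors are binary and stored as rows of `uint8` bytes, that both descriptor sets are non-empty, and that only the single nearest neighbour (k = 1) is considered; `max_dist` and `max_angle` are hypothetical threshold parameters, since the values used in the paper are not stated here.

```python
import numpy as np

def hamming_matrix(desc_a, desc_b):
    """Pairwise Hamming distances between rows of two uint8 descriptor arrays."""
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    return np.unpackbits(xor, axis=2).sum(axis=2)

def match_lines(desc_a, ang_a, desc_b, ang_b, max_dist, max_angle):
    """Nearest-neighbour matching gated by descriptor distance and angle.

    ang_a / ang_b hold the main direction (radians) of each segment.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    dist = hamming_matrix(desc_a, desc_b)
    matches = []
    for i in range(dist.shape[0]):
        j = int(np.argmin(dist[i]))  # nearest neighbour in descriptor space
        d_ang = abs(ang_a[i] - ang_b[j]) % np.pi
        d_ang = min(d_ang, np.pi - d_ang)  # segments are undirected
        if dist[i, j] < max_dist and d_ang < max_angle:
            matches.append((i, j))
    return matches
```

A production system would more likely use an existing matcher and a ratio test over k > 1 neighbours; the loop above only illustrates the two acceptance conditions.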

Triangulation of Space Line Segments
Representing a straight line by the homogeneous coordinates of two points introduces redundant parameters, which add computational cost in subsequent optimization. Therefore, this paper uses Plücker coordinates to represent straight lines. The Plücker coordinates are determined by two distinct points on the line $L_w$. Let the homogeneous coordinates of the two endpoints of $L_w$ be $p_1 = [x_1, x_2, x_3, x_4]^T$ and $p_2 = [y_1, y_2, y_3, y_4]^T$. The Plücker coordinates of $L_w$ are then $L_w = [n_w^T, v_w^T]^T$, where $[\cdot]_w$ denotes the coordinates of feature points or feature line segments in the world coordinate system; $\tilde{p}_1$ and $\tilde{p}_2$ are the Cartesian coordinates of $p_1$ and $p_2$, respectively; $n_w \in \mathbb{R}^3$ is the normal vector of line $L_w$; and $v_w \in \mathbb{R}^3$ is the direction vector of line $L_w$.
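As an illustration, the Plücker coordinates of a line through two points can be computed as below. The sketch assumes the common convention $n_w = \tilde{p}_1 \times \tilde{p}_2$ and $v_w = \tilde{p}_2 - \tilde{p}_1$ and finite points (non-zero fourth coordinate); the function name is hypothetical, and other sign or scale conventions exist.

```python
import numpy as np

def plucker_from_points(p1, p2):
    """Plücker coordinates (n, v) of the line through two homogeneous points.

    Assumed convention: n = p1~ x p2~ (normal of the plane through the line
    and the origin), v = p2~ - p1~ (direction), with p~ the Cartesian point.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    a = p1[:3] / p1[3]  # Cartesian coordinates of p1
    b = p2[:3] / p2[3]  # Cartesian coordinates of p2
    n = np.cross(a, b)
    v = b - a
    return n, v
```

By construction $n_w$ is perpendicular to $v_w$, which is the orthogonality constraint that makes the six-parameter form over-parameterized.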
The Plücker matrix $T$ is related to the Plücker coordinates through $v_w$ and $n_w^{\wedge}$, the antisymmetric matrix of $n_w$. Let $H_{cw}$ be the transformation matrix of line $L_w$ from the world coordinate system to the camera coordinate system; it is built from $R_{cw}$ and $t_{cw}$, the rotation matrix and translation vector from the world coordinate system to the camera coordinate system. $L_c$ denotes the coordinates of the line after this transformation. The spatial line $L_c$ is then projected onto the image plane to give the projected line $L_1$, where $\kappa$ is the projection matrix of line features. The Plücker coordinates are a six-parameter representation; they are over-parameterized and subject to an orthogonality constraint, which still causes unnecessary computation during optimization. To address this, Bartoli [33] proposed a four-parameter orthogonal representation, which is adopted in this paper.
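The relations described above can be sketched numerically as follows. The forms used here are the standard ones for a pinhole camera with intrinsics `fx, fy, cx, cy` and are an assumption about the exact layout used in the paper: $T = [[n^{\wedge}, v], [-v^T, 0]]$, $n_c = R_{cw} n_w + t_{cw}^{\wedge} R_{cw} v_w$, $v_c = R_{cw} v_w$, and a line projection matrix that acts only on $n_c$. Function names are hypothetical.

```python
import numpy as np

def skew(t):
    """Antisymmetric (cross-product) matrix of a 3-vector."""
    x, y, z = t
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def plucker_matrix(n, v):
    """4x4 Plücker matrix built from (n, v) in the assumed block layout."""
    T = np.zeros((4, 4))
    T[:3, :3] = skew(n)
    T[:3, 3] = v
    T[3, :3] = -np.asarray(v, dtype=float)
    return T

def transform_line(n_w, v_w, R_cw, t_cw):
    """Map Plücker coordinates from the world frame to the camera frame."""
    v_c = R_cw @ v_w
    n_c = R_cw @ n_w + skew(t_cw) @ v_c
    return n_c, v_c

def project_line(n_c, fx, fy, cx, cy):
    """Project a camera-frame line to image line coefficients [l1, l2, l3]."""
    K_line = np.array([[fy, 0.0, 0.0],
                       [0.0, fx, 0.0],
                       [-fy * cx, -fx * cy, fx * fy]])
    return K_line @ n_c
```

A quick sanity check is that the pixel projections of any two 3D points on the line satisfy $l_1 u + l_2 v + l_3 = 0$ for the projected line.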
Through the QR decomposition of the Plücker line coordinate $L_w = [n_w^T, v_w^T]^T$, its orthogonal representation $(U, W) \in SO(3) \times SO(2)$ can be obtained, where $U$ and $W$ are three-dimensional and two-dimensional rotation matrices, respectively, and $\theta$ is the rotation angle. The Plücker line coordinate $L'_w$ can then be recovered from the orthogonal representation using the columns of $U$, where $u_i$ denotes the $i$th column of matrix $U$.
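The conversion in both directions can be sketched as below, following the usual closed form of the orthonormal representation rather than an explicit QR routine. It assumes $n \perp v$ with both vectors non-zero (a line that does not pass through the origin); the function names are hypothetical.

```python
import numpy as np

def orthonormal_from_plucker(n, v):
    """Return (U, W) with U in SO(3) and W in SO(2) for Plücker coordinates (n, v)."""
    n = np.asarray(n, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_n, norm_v = np.linalg.norm(n), np.linalg.norm(v)
    c = np.cross(n, v)
    U = np.column_stack((n / norm_n, v / norm_v, c / np.linalg.norm(c)))
    s = np.hypot(norm_n, norm_v)
    # W = [[cos(theta), -sin(theta)], [sin(theta), cos(theta)]]
    W = np.array([[norm_n, -norm_v], [norm_v, norm_n]]) / s
    return U, W

def plucker_from_orthonormal(U, W):
    """Recover Plücker coordinates (up to scale) from (U, W)."""
    w1, w2 = W[0, 0], W[1, 0]
    return w1 * U[:, 0], w2 * U[:, 1]
```

The recovered coordinates equal the original ones divided by $\sqrt{\|n\|^2 + \|v\|^2}$, which is harmless because Plücker coordinates are defined only up to scale.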

Reprojection Error Model of Line Feature
As shown in Figure 5, the projection of line $L$ onto the image plane is the line segment $L_1$, and $l_1$ is the observed line segment. Let the endpoints of $l_1$ be $X_1 = [x_1, y_1, 1]^T$ and $X_2 = [x_2, y_2, 1]^T$, and let the projected line be $L_1 = [l_1, l_2, l_3]$. The reprojection error is then defined by the distances from the two endpoints to the projected line segment. The Jacobian matrix with respect to the camera pose increment is obtained by the chain rule (Equation (10)); $L_1$ and $L_c$ are obtained from Equations (5) and (6), from which the three terms on the right-hand side of Equation (10) follow.
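The endpoint-to-line distances can be computed as in the following sketch. It assumes the usual signed point-to-line distance $d(X, L_1) = X^T L_1 / \sqrt{l_1^2 + l_2^2}$ with endpoints given in homogeneous image coordinates; the function name is hypothetical.

```python
import numpy as np

def line_reprojection_error(line, x1, x2):
    """Signed distances from two homogeneous endpoints to the line [l1, l2, l3]."""
    line = np.asarray(line, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    norm = np.hypot(line[0], line[1])
    return np.array([x1 @ line, x2 @ line]) / norm
```

For the line y = 2, written as `[0, 1, -2]`, the endpoint `[0, 5, 1]` lies 3 units away and the endpoint `[7, 2, 1]` lies on the line, so the residual is `[3, 0]`.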

Nonlinear Optimization Based on Sliding Window
This paper adopts a nonlinear optimization method based on the sliding window model. To keep the number of optimization variables within a bounded range, variables are dynamically added to or removed from the sliding window, and only the key frame data from the current period participate in the pose solution. In the complete state vector of the sliding window, $x_i$ is the IMU state vector at window index $i$, in which $p_{\omega b_i}$ is the position, $q_{\omega b_i}$ is the orientation, and the remaining entries are the velocity and the accelerometer and gyroscope biases; $\lambda_m$ represents the inverse depth of the 3D points; $O_l$ is the orthogonal representation of the line features in the world coordinate system; $N$ is the number of key frames in the sliding window, $m$ is the number of point features observed by the key frames in the sliding window, and $l$ is the number of line features observed by the key frames in the sliding window.
On the basis of VINS-Mono, a line feature residual term is added to the objective function, which therefore consists of the marginalization prior residual, the IMU measurement residual, and the point and line residuals. In this objective function, $B$ is the set of IMU measurements; $D$ and $l$ are the sets of point features and line features, respectively, that are observed at least twice in the image frames; $\|r_p - H_p \chi\|^2_{\Sigma_p}$ is the marginalization prior, and $H_p$ is the Jacobian matrix of the marginalization prior residual.

Results
To verify the effectiveness of the proposed visual-inertial SLAM algorithm based on point-line feature fusion in indoor environments, experiments were carried out on the EuRoC datasets. The datasets were collected by a micro aerial vehicle (MAV) in two environments of different scale, an industrial factory and an indoor room. There are 11 sequences, which include stereo images (752 × 480), synchronized 200 Hz IMU measurements, ground-truth trajectories, and calibration files with the extrinsic and intrinsic parameters of the sensors. The sequences are classified into different difficulty levels according to lighting, texture, dynamic motion and motion blur.
Firstly, this paper verifies the effectiveness of the proposed improved LSD algorithm in screening invalid line segments in indoor environments, especially in the efficiency of line segment extraction and matching. Then, the root mean square error (RMSE) of absolute trajectory error (ATE) is used to evaluate the effect of the improved LSD algorithm on improving the accuracy of camera pose tracking, and the effect of the nonlinear optimization algorithm with point and line residuals on the accuracy of camera motion trajectory.
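For reference, the RMSE of the ATE over a trajectory can be computed as in this sketch. It assumes that the estimated and ground-truth positions are already time-associated and expressed in the same frame (the alignment step is omitted); the function name is hypothetical.

```python
import numpy as np

def ate_rmse(estimated, ground_truth):
    """Root mean square of the per-pose position error.

    Both inputs are (N, 3) arrays of positions that are assumed to be
    already associated in time and aligned to a common frame.
    """
    estimated = np.asarray(estimated, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    errors = np.linalg.norm(estimated - ground_truth, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```

In practice the estimated trajectory is first aligned to the ground truth, and that alignment choice affects the reported value.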

Evaluation of Line Feature Extraction Algorithm
In this section, the sequences MH_01_easy and MH_03_medium are selected from the industrial factory environment, and the sequences V1_01_easy, V1_03_difficult and V2_01_easy are selected from the indoor room environment. Ten groups of adjacent images were then randomly selected from these sequences for the line feature extraction experiment. Figure 6a shows scenes from MH_01_easy, V1_01_easy and V2_01_easy. Figure 6b shows the line segments extracted by the traditional LSD algorithm, which include a large number of short, crossed and overlapping line segments. Figure 6c shows the fixed threshold method (line segment length > 60) adopted in PL-VINS. Compared with traditional LSD, it removes most of the useless short line segments. However, the comparison of Figure 6c-e shows that the method adopted by PL-VINS also removes a large number of useful structural line segments. As shown in Table 1, compared with the traditional LSD and PL-VINS methods, the number of line segments extracted by PLI-VINS decreased significantly, and the average running time decreased by 58.5% and 25.6%, respectively.
Combining the data in Table 1 with the results in Figure 6, it can be observed that many unstable short line segments are screened out by the length factor, and that adjacent, overlapping and other line segments that repeatedly describe the same geometric feature are then merged. This yields a considerably more efficient line segment representation of indoor scenes and a shorter running time.

Accuracy Evaluation of Pose Trajectories
In this subsection, positioning accuracy is analyzed on all sequences of the EuRoC datasets, and PLI-VINS is compared with VINS-Mono, PL-VINS and PL-VIO. The absolute trajectory errors of the different algorithms on the EuRoC datasets are shown in Table 2, where the lowest errors are in bold. Figure 7 shows accuracy heatmaps of VINS-Mono and our algorithm on the sequences MH_03_medium, V1_01_easy and V2_01_easy; the gray dotted line represents the ground-truth trajectory, and the colored solid line represents the estimated trajectory. The color of the trajectory changes from blue to red as the ATE increases. Each row shows the results of five methods on the same sequence: the first two are the trajectories of VINS-Mono without and with loop closure, the third is the trajectory of the proposed method with length filtering only (no loop closure), the fourth is the trajectory of the complete proposed method (no loop closure), and the last is the trajectory of our algorithm with loop closure.
By comparing the three groups of trajectories in Figure 7, it can be observed that the proposed method shows better accuracy and stability in areas where the camera undergoes large rotations. Compared with the trajectory of VINS-Mono, the trajectory accuracy of the proposed PLI-VINS is improved by 40.9% on V2_01_easy, by 53.7% on V1_01_easy, and by 63.3%, the largest gain, on MH_03_medium. The corresponding improvements for PL-VINS are 32.2%, 61.2% and 48.6%. Comparing the third and fourth trajectory plots of each group also shows that the proposed method performs well in different indoor environments: properly merging adjacent line segments further improves line segment quality and, in turn, the trajectory accuracy of the camera.
Taking Table 1 and Figure 6 together with the behavior of PLI-VINS in the three scenes, it can be seen that V1_01_easy and V2_01_easy are indoor scenes with relatively simple environments, in which line features have limited ability to describe structure; nevertheless, by eliminating redundant line segments and merging line segments to improve line segment quality, the proposed method still achieves good trajectory accuracy. The MH_03_medium factory scene, in contrast, contains a large number of good structural line segments, which helps PLI-VINS improve camera trajectory accuracy through line features. This also shows that the proposed PLI-VINS performs well in various indoor environments.
In terms of the root mean square error of the absolute trajectory error, as shown in Table 2, the proposed method performs better in almost all EuRoC sequences. Figure 8 shows the trajectory comparison of the three algorithms in the industrial factory scene of sequence MH_01_easy and the indoor room scene of sequence V1_03_difficult. Compared with PL-VINS, the proposed method has smaller trajectory errors across the EuRoC sequences, especially in difficult scenarios. Over all easy scenarios, the RMSE is 0.083 for the proposed PLI-VINS, 0.107 for PL-VINS and 0.155 for VINS-Mono; relative to VINS-Mono, trajectory accuracy is thus improved by 46.5% for PLI-VINS and 30.9% for PL-VINS. In difficult scenarios, the trajectory accuracy of the proposed PLI-VINS is improved by 41.3%, whereas that of PL-VINS is improved by only 21.1%. The trajectory comparison in Figure 8a,b likewise shows that the advantage of PLI-VINS is larger in difficult scenes.
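The easy-scenario percentages can be reproduced from the quoted RMSE values, assuming that both improvements are measured relative to VINS-Mono. The helper name below is hypothetical.

```python
def improvement_percent(baseline_rmse, rmse):
    """Relative reduction of RMSE with respect to a baseline, in percent."""
    return 100.0 * (baseline_rmse - rmse) / baseline_rmse

# RMSE values quoted in the text for the easy scenarios.
vins_mono, pl_vins, pli_vins = 0.155, 0.107, 0.083

pli_gain = improvement_percent(vins_mono, pli_vins)  # about 46.5
pl_gain = improvement_percent(vins_mono, pl_vins)    # about 31.0 (quoted as 30.9)
```

Measured against PL-VINS instead, the same formula gives a gain of roughly 22% for PLI-VINS, which confirms that the quoted 30.9% refers to PL-VINS versus VINS-Mono.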

Conclusions
In this paper, a visual-inertial SLAM algorithm based on point-line feature fusion for various indoor environments is proposed. Compared with visual-inertial SLAM algorithms based on point features alone, the proposed PLI-VINS combines point and line features to increase the robustness of the visual-inertial SLAM system. PLI-VINS is built on VINS-Mono and evaluated on the EuRoC datasets. Unlike existing work, PLI-VINS exploits the advantages of different features and sensors and effectively integrates point, line and IMU data by improving the quality of the extracted line features, thus improving the robustness and accuracy of the system. A comparison with similar existing work shows that the proposed method achieves the highest accuracy in most indoor situations.
In future work, we will improve the system by looking for further constraints between 3D lines, introducing line features into the initialization process, and adding line features to the bag-of-words model and to a dense map built from point and line features. These extensions are expected to make the system more suitable for indoor environments and to improve both the accuracy of camera trajectory estimation and the stability of system operation.